Unsupervised Concept-to-text Generation with Hypergraphs
نویسندگان
چکیده
Concept-to-text generation refers to the task of automatically producing textual output from non-linguistic input. We present a joint model that captures content selection (“what to say”) and surface realization (“how to say”) in an unsupervised domain-independent fashion. Rather than breaking up the generation process into a sequence of local decisions, we define a probabilistic context-free grammar that globally describes the inherent structure of the input (a corpus of database records and text describing some of them). We represent our grammar compactly as a weighted hypergraph and recast generation as the task of finding the best derivation tree for a given input. Experimental evaluation on several domains achieves competitive results with state-of-the-art systems that use domain specific constraints, explicit feature engineering or labeled data.
منابع مشابه
Text-to-Image Generation based on Crossmodal Association with Hierarchical Hypergraphs
In this paper, we propose a novel framework for text-to-image generation based on association between text and image modalities. As an association model, we use hierarchical hypergraphs which consist of two layers including heterogeneous hypergraphs. While the first layer is composed of two hypergraphs: a text hypergraph and an image hypergraph, a hypergraph in the second layer associates two m...
متن کاملA Global Model for Concept-to-Text Generation
Concept-to-text generation refers to the task of automatically producing textual output from non-linguistic input. We present a joint model that captures content selection (“what to say”) and surface realization (“how to say”) in an unsupervised domain-independent fashion. Rather than breaking up the generation process into a sequence of local decisions, we define a probabilistic context-free g...
متن کاملخوشهبندی اسناد مبتنی بر آنتولوژی و رویکرد فازی
Data mining, also known as knowledge discovery in database, is the process to discover unknown knowledge from a large amount of data. Text mining is to apply data mining techniques to extract knowledge from unstructured text. Text clustering is one of important techniques of text mining, which is the unsupervised classification of similar documents into different groups. The most important step...
متن کاملProbabilistic Approaches for Modeling Text Structure and their Application to Text-to-Text Generation (Invited Talk)
Text-to-text generation aims to produce a coherent text by extracting, combining and rewriting information given in input texts. Examples of its applications include summarization, answer fusion in question-answering and text simplification. At first glance, text-to-text generation seems a much easier task than the traditional generation set-up where the input consists of a non-linguistic repre...
متن کاملUnsupervised Sense Disambiguation Using Bilingual Probabilistic Models
We describe two probabilistic models for unsupervised word-sense disambiguation using parallel corpora. The first model, which we call the Sense model, builds on the work of Diab and Resnik (2002) that uses both parallel text and a sense inventory for the target language, and recasts their approach in a probabilistic framework. The second model, which we call the Concept model, is a hierarchica...
متن کامل